Deep learning-based Fast Volumetric Image Generation for Image-guided Proton FLASH Radiotherapy

Objective: FLASH radiotherapy leverages ultra-high dose-rate radiation to enhance the sparing of organs at risk without compromising tumor control probability. This may allow dose escalation, toxicity mitigation, or both. To prepare for ultra-high dose-rate delivery, we aim to develop a deep learning (DL)-based image-guided framework that enables fast volumetric image reconstruction for accurate target localization in proton FLASH beam delivery. Approach: The proposed framework comprises four modules: orthogonal kV x-ray projection acquisition, DL-based volumetric image generation, image quality analyses, and water equivalent thickness (WET) evaluation. We investigated volumetric image reconstruction using kV projection pairs acquired at four different source angle pairs. Thirty patients with lung targets were identified from an institutional database, each patient having a four-dimensional computed tomography (CT) dataset with ten respiratory phases. Leave-phase-out cross-validation was performed to investigate the DL model's robustness for each patient. Main results: The proposed framework reconstructed patients' volumetric anatomy, including tumors and organs at risk, from orthogonal x-ray projections. Considering all evaluation metrics, the kV projections with source angles of 135° and 225° yielded the optimal volumetric images. The patient-averaged mean absolute error, peak signal-to-noise ratio, structural similarity index measure, and WET error were 75±22 HU, 19±3.7 dB, 0.938±0.044, and −1.3%±4.1%, respectively. Significance: The proposed framework has been demonstrated to reconstruct volumetric images with a high degree of accuracy using two orthogonal x-ray projections. The embedded WET module can be used to detect potential proton beam-specific patient anatomy variations. This framework can rapidly deliver volumetric images to potentially guide proton FLASH therapy treatment delivery systems.


Introduction
Proton therapy utilizes the physical characteristics of protons, which have well-defined ranges in a medium, to conformally deposit radiation energy to target volumes without exit doses 1,2 . This feature decreases the toxicity to healthy tissues such that patients who received proton therapy have lower risks of unplanned hospitalization compared to those who received photon treatment 3 . However, proton range uncertainty 4 requires additional margins for robust treatment planning 5,6 , which may compromise the sparing of healthy tissues when organs at risk (OAR) are closely adjacent to the treatment target. In the era of precision medicine, the critical question is how to minimize the radiation doses to OARs such that planning constraints are no longer dose-limiting for typical target prescription doses.
Favaudon et al. demonstrated that ultra-high dose-rate (≥ 40 Gy/sec) radiation, the so-called FLASH effect 7 , can preferentially spare normal tissues from acute radiation-induced apoptosis while maintaining similar tumor control probability (TCP) 8 . This promising finding can potentially lead to a paradigm shift in radiotherapy, and the FLASH effect has been explored in several settings since its discovery [9][10][11] . Proton FLASH therapy has been investigated regarding the feasibility of using the current commercial treatment delivery system and inverse planning [12][13][14] . Meanwhile, the efficiency and accuracy of image-guided systems are important because of the high dose and ultra-high dose rate of FLASH delivery 15 . On-board fast imaging systems are essential for detecting potential patient anatomy changes and for motion management, especially for patients with lung targets. However, the current proton on-board cone-beam computed tomography (CBCT) images require 30-60 seconds of scan time, and their quality can be compromised by motion and artifacts.
Commercial proton machines, such as Varian ProBeam and IBA Proteus®ONE, include two kV x-ray sources with an image acquisition time of less than a second. The two orthogonal kV projections can potentially be acquired simultaneously, so a volumetric reconstruction method based on these projections will be free of motion and cavity artifacts. However, image reconstruction based on two projections is ill-conditioned 16 . This ill-posed problem poses a challenge for conventional image reconstruction methods. In contrast, deep learning (DL) models have been shown to be universal approximators 17 , and they use hierarchical learning to discover the underlying patterns behind the data 18 . A significant challenge of applying DL to medical volumetric image reconstruction is the identification of tumor regions, because information is lost when three-dimensional (3D) volumetric images are superimposed onto 2D projections 19 .
Many researchers have investigated various DL models to reconstruct 3D volumetric images based on limited 2D information [20][21][22] . One approach uses deformable image registration techniques to register 2D and 3D images 23 . A recent development 24 integrates DL and mechanical models to achieve real-time liver tumor localization. However, the previous literature is usually based on a single x-ray projection, and the robustness of DL models in terms of CT numbers for dose evaluation remains an open question.
This study proposes a DL-based image-guided framework to inform potential proton FLASH treatment, including tumor position and patient anatomy changes. We use two orthogonal x-ray projections to provide additional information and enhance the predictive capability of DL models. Most importantly, we integrate a ray tracing-based water equivalent thickness (WET) evaluation module into the proposed framework for treatment feasibility investigation. This module can specifically detect potential anatomy changes along the proton beam paths. To evaluate the proposed framework, we investigate under which conditions the volumetric images can be derived effectively, accurately, and robustly to support medical decision-making.
2 Materials and methods

Patient data
This work aims to develop an image-guided framework to manage patient anatomy and motion for accurate target localization before proton FLASH beam delivery. Since treatment of lung targets usually requires motion management, we identified 30 patients with 4D CT from an institutional database for the framework demonstration. Each 4D CT dataset includes CT images of ten respiratory phases acquired on a Siemens SOMATOM Definition AS scanner using a 120-kVp spectrum. The CT dataset for each phase has a resolution of 0.98 × 0.98 × 3.0 mm³. Synthetic kV x-ray projections were used to investigate the feasibility of the proposed framework, and the projections were generated based on the Varian (Varian Medical Systems, Palo Alto) kV image system. The digital x-ray panel includes 768 × 1024 detector channels with a spacing of 0.39 mm. The synthetic kV projections were generated for four different angle pairs, 112°/202°, 135°/225°, 157°/247°, and 180°/270°, defined by the x-ray source positions.

Deep learning-based image-guided framework for proton FLASH treatment
An image-guided system is essential for proton FLASH treatment because of its ultra-high dose rate. Figure 1 depicts the DL-based image-guided framework for proton FLASH treatment, which includes four modules to ensure the accuracy of target dose delivery. Figure 1(a) shows the kV image system of a conventional proton treatment machine, including two digital x-ray panels acquiring orthogonal projections. Figure 1(b) displays the volumetric image generation module using two orthogonal x-ray projections. A deep learning model, InverseNet3D, was implemented to inversely transform two orthogonal 2D images into 3D images. Section 2.2.1 gives the details of InverseNet3D regarding the model form and model parameters. Figure 1(c) shows the image evaluation module, which quantifies the integrity of the generated volumetric images, i.e., how well they conserve image features from the reference CT such as CT number histograms, image structures, noise levels, and statistical parameters. Section 2.2.2 gives the details of each evaluation metric. Figure 1(d) depicts the treatment evaluation module, which detects potential patient anatomy changes. The assessment is based on WET comparisons between the reference CT and generated volumetric images, and the details of the comparisons are described in Section 2.2.3. The framework performance has been evaluated regarding image quality and proton characteristics (i.e., WET). The validated framework is expected to deliver on-board volumetric images for localization and delivery of proton FLASH treatment.
Figure 1(b) depicts the hierarchical structure of InverseNet3D, which inversely transforms two orthogonal x-ray projections into volumetric images. InverseNet3D includes three components to infer volumetric images from 2D projections. First, the feature extractors, built from convolutional neural networks (CNN), identify local patterns in the x-ray projections.
The second component includes multiple residual blocks to prevent gradient vanishing and to enhance the performance of model learning during the error backpropagation processes. Each residual block's fundamental unit is composed of two convolutional layers with a single residual layer. Ultimately, the deconvolution component upscales the dimension of local feature images received from the residual blocks to conserve the dimensions of volumetric images (CT) from patients. The detailed model structures and parameters are given in Appendix A.

InverseNet3D
To investigate the model robustness, InverseNet3D was trained using leave-phase-out patient-specific training with 4D CT datasets from 30 patients. Supervised loss functions were used to train the model, including a mean absolute error (MAE) loss for voxel-wise image intensity learning and a gradient loss for image edge learning. TensorFlow v1.15.0 25 was used to implement the model hierarchy, optimization, and data preprocessing. The computing environment included an NVIDIA Tesla V100 GPU.
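The training setup described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' TensorFlow implementation: the function names, the forward-difference form of the gradient loss, and the `edge_weight` parameter are assumptions made for illustration, and a 2D slice stands in for the 3D volume.

```python
from typing import List, Tuple

Image = List[List[float]]  # a 2D slice stored as rows of CT numbers (HU)


def leave_phase_out_splits(n_phases: int = 10) -> List[Tuple[List[int], int]]:
    """Return (training phases, held-out phase) for every respiratory phase."""
    return [([p for p in range(n_phases) if p != held_out], held_out)
            for held_out in range(n_phases)]


def mae_loss(pred: Image, ref: Image) -> float:
    """Mean absolute voxel-wise intensity difference."""
    diffs = [abs(p - r) for pr, rr in zip(pred, ref) for p, r in zip(pr, rr)]
    return sum(diffs) / len(diffs)


def finite_differences(img: Image) -> List[float]:
    """Forward differences along rows and columns, flattened into one list."""
    dx = [row[j + 1] - row[j] for row in img for j in range(len(row) - 1)]
    dy = [img[i + 1][j] - img[i][j]
          for i in range(len(img) - 1) for j in range(len(img[0]))]
    return dx + dy


def gradient_loss(pred: Image, ref: Image) -> float:
    """Mean absolute difference between the finite-difference gradients."""
    gp, gr = finite_differences(pred), finite_differences(ref)
    return sum(abs(a - b) for a, b in zip(gp, gr)) / len(gp)


def total_loss(pred: Image, ref: Image, edge_weight: float = 1.0) -> float:
    """MAE loss plus a weighted gradient loss (the weight is hypothetical)."""
    return mae_loss(pred, ref) + edge_weight * gradient_loss(pred, ref)
```

With ten phases per patient, `leave_phase_out_splits(10)` yields ten train/test splits, one model variant per held-out phase. The relative weighting of the two loss terms is not stated in the text, so `edge_weight` is left as a free parameter here.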

Image evaluation
Three metrics were used to evaluate the voxel-wise quality of each volumetric image generated by InverseNet3D. Eq. (1) gives the mean error (ME), where x, i, N, DL, and ref denote the voxel CT number, the i-th voxel, the total number of voxels, the images generated by InverseNet3D, and the reference images, respectively. CT numbers are given in Hounsfield units (HU). The ME quantifies systematic intensity shifts of the generated volumetric images from the reference. Eq. (2) defines the MAE, which determines the global quality of the generated images. Eq. (3) gives the peak signal-to-noise ratio (PSNR), which evaluates the reconstruction quality of InverseNet3D from orthogonal 2D projections. The structural similarity index measure (SSIM), as previously described 26 , was used to quantify the structural consistency of the lesion region between the reference and InverseNet3D images. The histogram of CT numbers was also evaluated to investigate differences in global profiles. All evaluation metrics were implemented using MATLAB R2021a.
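Eqs. (1)-(3) are not reproduced in this text, so the following sketch assumes the conventional definitions of ME, MAE, and PSNR, using the symbols defined above. The sign convention (generated minus reference) and the `data_range` argument used as the PSNR peak value are assumptions. The authors' implementation was in MATLAB, and this Python version is only illustrative.

```python
import math
from typing import Sequence


def mean_error(dl: Sequence[float], ref: Sequence[float]) -> float:
    """ME = (1/N) * sum_i (x_i^DL - x_i^ref); reveals systematic HU shifts."""
    return sum(d - r for d, r in zip(dl, ref)) / len(ref)


def mean_absolute_error(dl: Sequence[float], ref: Sequence[float]) -> float:
    """MAE = (1/N) * sum_i |x_i^DL - x_i^ref|."""
    return sum(abs(d - r) for d, r in zip(dl, ref)) / len(ref)


def psnr(dl: Sequence[float], ref: Sequence[float], data_range: float) -> float:
    """PSNR = 10 * log10(data_range^2 / MSE), in dB.

    data_range is the assumed peak-to-peak CT-number range of the image.
    """
    mse = sum((d - r) ** 2 for d, r in zip(dl, ref)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)
```

The inputs are flattened voxel lists. Because positive and negative errors cancel in the ME but not in the MAE, an image can have an ME near zero while still having a large MAE, which is why both are reported.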

Treatment evaluation
Due to the ultra-high dose rate of proton FLASH therapy, patient anatomy changes are critical and may significantly impact treatment quality. The WET [27][28][29] can be used to quantify potential inter-fractional or intra-fractional anatomic changes during proton FLASH treatment. To maximally spare organs at risk, an anteroposterior beam is commonly used in the treatment 30 , and we explored the WET for this beam. We implemented a ray tracing-based WET algorithm 31,32 in MATLAB R2021a to derive the WET within the target region. The CT-number-to-relative-stopping-power (RSP) conversion table 33,34 required for the WET calculation is given in Appendix B.
The RSP can be derived from CT numbers using the Hounsfield look-up table (HLUT) in Appendix B. Gaussian fitting was applied to the raw WET histograms to minimize the uncertainty due to image noise. WET uncertainty can be caused by patient anatomy changes and by inconsistent image acquisition modalities between treatment planning and daily image guidance. This work aims to investigate the WET uncertainty arising from the image-guided system (i.e., the quality of the images generated by InverseNet3D). Eq. (4) defines the difference of WET (∆WET) within a given region of interest (ROI), where i, N, DL, and ref denote the i-th voxel in the ROI, the total number of voxels in the ROI, the InverseNet3D-generated images, and the reference images, respectively. Eq. (5) defines the relative difference of WET (ε WET ) within a given ROI. We focused on the target ROI in this work since the target volume is directly associated with the proton beams. To increase tumor control probability, accurate dose delivery to the target volume must be ensured.
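The WET evaluation can be sketched as follows. The look-up table values are hypothetical placeholders and not the Appendix B table. Eqs. (4) and (5) are not reproduced in this text, so the ROI-averaged forms of ∆WET and ε WET below are assumptions. The ray is simplified to a straight column of voxels along the anteroposterior direction, and all function names are illustrative.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# Hypothetical HLUT nodes (CT number in HU, RSP); NOT the Appendix B table.
HLUT: List[Tuple[float, float]] = [(-1000.0, 0.0), (0.0, 1.0), (1000.0, 1.5)]


def hu_to_rsp(hu: float, hlut: Sequence[Tuple[float, float]] = HLUT) -> float:
    """Piecewise-linear CT-number-to-RSP conversion, clamped at both ends."""
    xs = [x for x, _ in hlut]
    if hu <= xs[0]:
        return hlut[0][1]
    if hu >= xs[-1]:
        return hlut[-1][1]
    k = bisect_right(xs, hu)
    (x0, y0), (x1, y1) = hlut[k - 1], hlut[k]
    return y0 + (y1 - y0) * (hu - x0) / (x1 - x0)


def wet_along_ray(hu_samples: Sequence[float], step_mm: float) -> float:
    """WET (mm): sum of RSP times path length over the voxels a ray crosses."""
    return sum(hu_to_rsp(h) for h in hu_samples) * step_mm


def wet_differences(wet_dl: Sequence[float],
                    wet_ref: Sequence[float]) -> Tuple[float, float]:
    """Return (mean WET difference in mm, mean relative difference in %).

    Both are averaged over the ROI samples, generated minus reference.
    """
    n = len(wet_ref)
    delta = sum(d - r for d, r in zip(wet_dl, wet_ref)) / n
    eps = 100.0 * sum((d - r) / r for d, r in zip(wet_dl, wet_ref)) / n
    return delta, eps
```

For example, a ray crossing three water-like voxels and one air-like voxel with a 1 mm step gives `wet_along_ray([0, 0, -1000, 0], 1.0)`, which evaluates to 3.0 mm under the placeholder table. The Gaussian fitting of the WET histograms described in the text is not included in this sketch.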

Volumetric image generation using orthogonal kV projections from different angles
We investigated the framework performance using 4D CT images from 30 patients with four orthogonal projection pairs: 180°/270°, 135°/225°, 157°/247°, and 112°/202°. The framework-synthesized CT images were evaluated using image- and proton-related metrics. Table 1 shows the results of all patients based on the projections from the source angle pair of 135°/225°; the standard deviations (SD) of each metric across respiratory phases are generally small, typically below 5% of the mean values. The framework evaluation results for the other source angle pairs are given in Tables C1-C3 in Appendix C. We systematically analyzed the phase-averaged outcomes of CT image synthesis for each patient using 2D orthogonal projections generated from the four source angle pairs. Table 2 shows the patient-averaged evaluation results of generated image quality using InverseNet3D with the four orthogonal projection pairs. The 180°/270° projection pair results in the smallest ME in magnitude, which corresponds to an error of approximately −0.3% (−3.3/1000 × 100%) in material properties because CT number (HU) = 1000(µ − 1), where µ is the linear attenuation coefficient of the material relative to water. The percentage ME values for the other three orthogonal projection pairs are −1.6%, −0.9%, and 0.4%, corresponding to the angle pairs of 135°/225°, 157°/247°, and 112°/202°, respectively.
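The conversion from a mean CT-number error to a percentage error follows directly from the relation CT number (HU) = 1000(µ − 1) quoted above. A short worked example, with illustrative function names, is:

```python
def hu_to_relative_mu(hu: float) -> float:
    """Invert CT number (HU) = 1000 * (mu - 1) for the water-relative mu."""
    return 1.0 + hu / 1000.0


def me_to_percent(me_hu: float) -> float:
    """Express a mean CT-number error (HU) as a percentage of water's mu."""
    shift = hu_to_relative_mu(me_hu) - hu_to_relative_mu(0.0)
    return 100.0 * shift
```

Here `me_to_percent(-3.3)` reproduces the approximately −0.3% quoted for the 180°/270° pair (about −0.33% before rounding), and the same relation maps the −1.6% quoted for the 135°/225° pair back to an ME of about −16 HU.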
The volumetric image generated from the 112°/202° projection pair yields the lowest MAE, while the 180°/270° projection pair yields the highest MAE. The PSNR analyses show that the 157°/247° and 180°/270° projection pairs have comparable 3D reconstruction image quality, whereas the volumetric image inferred from the 112°/202° projection pair has the lowest PSNR. The highest SSIM was achieved by the image generated by InverseNet3D with the 157°/247° projections, and the volumetric images inferred from the 180°/270° projection pair result in the worst SSIM value.

Table 2 Evaluation metrics of volumetric image quality generated from InverseNet3D using mean error (ME), mean absolute error (MAE), peak signal-to-noise ratio (PSNR), and structural similarity index measure (SSIM). All metrics are averaged over all patients given in Table 1 and Table C1

To demonstrate the visual results of the proposed framework using all orthogonal projection pairs, we formulated case 1 and case 2 using the synthetic image sets from patient 3 and patient 15. The selection was based on the 5th (patient 3) and 95th (patient 15) percentiles of the PSNR results from all patients. Figure 2 demonstrates that InverseNet3D can generate volumetric images from four orthogonal projection pairs; case 1 is seen in transversal, coronal, and sagittal views. The lesion locations are marked in red for both reference and generated images. Volumetric images derived from all orthogonal projection pairs reconstruct the patient's anatomy and allow the tumor in the lung to be identified. The heart and liver can be clearly identified from the transversal and sagittal images. Figure 3 depicts the reference and generated images for case 2, which has a smaller body and tumor size than the patient shown in Fig. 2.
The lesion target and important organs can be recognized from the volumetric images, such as the tumor, heart, aorta, vena cava, and spine. Figure 4 shows the histogram comparisons of CT numbers between the reference and InverseNet3D-generated images for cases 1 and 2. Figure 4 shows that the generated images from the 135°/225° projection pair align well with the reference histograms for both inhale and exhale scenarios. The 157°/247° and 112°/202° projection pairs exhibit a slight shift from the reference histogram. Compared to other projection pairs, apparent histogram shifts of generated images from the 180°/270° projection pair can be observed in Fig. 4.

Treatment evaluation using water equivalent thickness (WET)
The phase-averaged WET analysis results for all 30 patients with four projection angle pairs are given in Table 1 and Tables C1-C3 in Appendix C. Table 3 summarizes the patient-averaged WET difference (∆WET) and relative WET difference (ε WET ) results calculated by Eq. (4) and Eq. (5) for the volumetric images generated by InverseNet3D using multiple orthogonal projection pairs. The volumetric images from the 180°/270° projection pair result in the smallest ∆WET of −0.7 mm. However, its standard deviation is approximately twice that of images generated from the 135°/225° projection pair. The images generated from the 135°/225° projection pair also lead to the minimal standard deviations of ∆WET and ε WET , with values of 3.7 mm and 4.1%, respectively.

Table 3 Treatment evaluation metrics of volumetric image quality generated from InverseNet3D using the difference and relative difference of water equivalent thicknesses (∆WET/ε WET ). All metrics are averaged over all patients given in Table 1 and Table C1-C3. The ∆WET and ε WET are calculated within the target contour for an anteroposterior proton beam.

Discussion
Proton FLASH treatment can potentially create a paradigm shift in radiotherapy due to its capability to deliver ultra-high dose-rate irradiation while maximally reducing the toxicity to normal tissues. The inherent ultra-high dose rate makes proton FLASH well suited to stereotactic body radiation therapy, which can help proton therapy become affordable and benefit patients by avoiding short-term and long-term side effects. As the number of treatment fractions decreases, the prescription dose per fraction will inevitably increase, such that an accurate image-guided system becomes essential. This work demonstrates a DL-based image-guided framework that rapidly generates volumetric images without motion artifacts by using two instantaneously captured orthogonal x-ray projections. Acquiring the kV x-ray projections takes less than a second, after which the proposed framework can almost instantaneously deliver volumetric images for treatment evaluation.
In contrast, the current proton on-board CBCT 35 requires approximately 35 and 60 seconds for full-fan and half-fan scans, respectively. InverseNet3D is currently implemented in the framework. Based on the retrospective patient study, the anatomy of all 30 patients, including tumor tissues, can be identified from the generated volumetric images. The proposed framework delivers a patient-averaged MAE of approximately 75 HU, an improvement in image quality of at least 20% over previous work 36, 37 . Most importantly, the framework integrates a WET analysis module for treatment evaluation, and a patient-averaged ∆WET of ~ 1 mm can be achieved in the present work. The WET analyses not only indicate the image quality but also monitor potential anatomy changes along the treatment beam path, which potentially increases the usability of the proposed framework to inform proton FLASH treatment.
Table 1 and Tables C1-C3 provide the phase-averaged evaluation of the generated volumetric images for each patient, using a leave-phase-out cross-validation method to investigate the robustness of the InverseNet3D module in the framework. Since each patient had ten respiratory phases, ten variants of InverseNet3D were trained for each patient, and a total of 300 variants of InverseNet3D were explored for the 30-patient cohort to ensure the method's robustness. Table 1 and Tables C1-C3 indicate that the intra-patient standard deviations of each evaluation metric are usually smaller than 5% of their mean values. This result shows that the proposed framework can consistently infer volumetric images from orthogonal x-ray projections.
Tables 2 and 3 provide the patient-averaged image and WET evaluation to investigate the inter-patient variability. Table 2 shows that the 180°/270° projection pair yields the largest inter-patient image intensity variation, as indicated by the largest MAE of 80 ± 24 HU. The 180°/270° projection pair also results in the smallest SSIM, while the other three projection pairs have comparable SSIM values. Table 3 also shows that the 180°/270° projection pair gives the largest ε WET standard deviation of 6.8%. Meanwhile, Table 3 indicates that the 135°/225° projection pair can derive the volumetric images with minimal inter-patient variation when considering the uncertainty. The 157°/247° and 112°/202° projection pairs show comparable results. Taken together, these findings suggest avoiding lateral x-ray projections for image inference. The lateral projections may be less informative because the projection area is smaller compared to the projection areas from other source angles.
approach requires a considerable amount of numerical experiment time. Future investigation will likely focus on developing advanced validation experiments using human-mimicking phantoms and state-of-the-art instrumentation 44 to quantify proton range uncertainty. Then the experiment data can be used to identify which DL models can work compatibly, effectively, and robustly with the proposed image-guided framework for proton FLASH radiotherapy.

Conclusions
A DL-based image-guided framework has been demonstrated for generating volumetric images using two orthogonal kV x-ray projections. The approach includes image quality and WET analyses for potential online dose evaluation and potential inter-fractional and intra-fractional anatomy changes. The proposed framework can inherently avoid motion artifacts and deliver instantaneous patient anatomy and target position to inform potential proton FLASH treatment.

Declarations
Acknowledgments
This research is supported in part by the National Institutes of Health under Award Numbers R01CA215718 and R01EB032680.

Ethical Statement
The ethics committee, Emory Institutional Review Board (Emory IRB), reviewed and approved this study (IRB #114349). All methods were carried out in accordance with relevant guidelines and regulations.

Informed Consent Statement
Informed consent was waived by the ethics committee of the Emory Institutional Review Board (Emory IRB).

Data Availability Statement
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Figure 1
Deep learning-based image-guided framework for proton therapy FLASH treatment, consisting of four modules: (a) kV image system with two digital x-ray panels A and B to acquire two orthogonal projections simultaneously, (b) deep learning (DL)-based volumetric image generation using orthogonal kV projections (InverseNet3D is implemented in this module), (c) image evaluation based on CT numbers to ensure the integrity of generated volumetric images without systematic shift of voxel intensity, and (d) treatment evaluation based on WET to detect potential anatomy changes.

Figure 2
Reference (Ref.) and generated (InverseNet3D) volumetric images seen in transversal, coronal, and sagittal views for case 1 for both inhale and exhale phases using orthogonal kV projections at various source angle pairs. The window level of each image is [-1000, 200] Hounsfield units (HU). The lesion ROI is marked in red.

Figure 3
Reference and generated (InverseNet3D) volumetric images seen in transversal, coronal, and sagittal views of case 2 for inhale and exhale phases using orthogonal kV projections at various source angle pairs. The window level of each image is [-1000, 200] Hounsfield units (HU). The lesion ROI is marked in red.